2,113 research outputs found

    Reducing redundancy of real time computer graphics in mobile systems

    The goal of this thesis is to propose novel and effective techniques to eliminate redundant, energy-wasting computations performed in real-time computer graphics applications, with a special focus on mobile GPU micro-architecture. Improving the energy efficiency of CPU/GPU systems is not only key to extending battery life, but also allows performance to increase because, to avoid overheating above thermal limits, SoCs tend to be throttled when the load is high for a long period of time. Prior studies pointed out that the CPU and especially the GPU are the principal energy consumers in the graphics subsystem, with off-chip main memory accesses and the processors inside the GPU being the primary contributors. First, we focus on reducing redundant fragment processing computations by improving the culling of hidden surfaces. During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will end up not being part of the final image. By the time the GPU realizes that an object or part of it is not going to be visible, all the activity required to compute its color and store it has already been performed. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware to maximize the culling effectiveness of the GPU and minimize overshading, hence reducing execution time and energy consumption. VRO exploits the fact that the objects in animated graphics applications tend to keep their relative depth order across consecutive frames (temporal coherence) to provide the feeling of smooth transition. VRO keeps the visibility information of a frame and uses it to reorder the objects of the following frame.
VRO only requires adding a small hardware unit to capture the visibility information and use it later to guide the rendering of the following frame. Moreover, VRO works in parallel with the graphics pipeline, so it incurs negligible performance overhead. We illustrate the benefits of VRO using various unmodified commercial 3D applications, for which VRO achieves a 27% speed-up and a 14.8% energy reduction on average. Then, we focus on avoiding redundant computations related to CPU Collision Detection (CD). Graphics applications such as 3D games represent a large percentage of the applications downloaded for mobile devices, and the trend is towards more complex and realistic scenes with accurate 3D physics simulations. CD is one of the most important algorithms in any physics kernel since it identifies the contact points between the objects of a scene and determines when they collide. However, accurate real-time CD is very expensive in terms of energy consumption. We propose Render Based Collision Detection (RBCD), a novel energy-efficient, high-fidelity CD scheme that leverages some intermediate results of the rendering pipeline to perform CD, so that redundant tasks are done just once. Comparing RBCD with a conventional CD executed entirely on the CPU, we show that its execution time is reduced by almost three orders of magnitude (600x speedup), because most of the CD task in our model comes for free by reusing the intermediate results of image rendering. Such a dramatic time improvement may, though not necessarily, translate into a higher frame rate if the physics simulation is on the critical path. However, the most important advantage of our technique is the enormous energy saving that results from eliminating a long and costly CPU computation and converting it into a few simple operations executed by specialized hardware within the GPU. Our results show that the energy consumed by CD is reduced on average by a factor of 448x (i.e., by 99.8%).
These dramatic benefits are accompanied by a higher-fidelity CD analysis (i.e., with finer granularity), which improves the quality and realism of the application. Postprint (published version)

    On the influence of the thickness of the sediment moving layer in the definition of the bedload transport formula in Exner systems

    In this paper we study the Exner system and introduce a modified general definition of the bedload transport flux. The new formulation has the advantage of taking into account the thickness of the sediment layer, which avoids mass conservation problems in certain situations. Moreover, it reduces to a classical solid transport discharge formula in the quasi-uniform regime. We also present several numerical tests in which we compare the proposed sediment transport formula with the classical formulation and show the behavior of the new model in different configurations.
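
For orientation, a commonly used one-dimensional form of the shallow water (Saint-Venant) equations coupled with the Exner equation is sketched below. It is not taken from the paper: the notation is an assumption, friction is omitted, and the Grass formula is shown only as one example of a classical closure.

```latex
\begin{aligned}
\partial_t h + \partial_x q &= 0, \\
\partial_t q + \partial_x\!\left(\frac{q^2}{h} + \frac{1}{2} g h^2\right) &= -g h\, \partial_x z_b, \\
\partial_t z_b + \xi\, \partial_x q_b &= 0, \qquad \xi = \frac{1}{1 - \psi_0}, \\
q_b &= A_g\, u\, |u|^{m_g - 1}, \qquad u = \frac{q}{h}, \quad 1 \le m_g \le 4 .
\end{aligned}
```

Here $h$ is the water depth, $q = hu$ the discharge, $z_b$ the bed elevation, $q_b$ the solid transport discharge, $\psi_0$ the porosity of the sediment layer, and $A_g$ an empirical constant. In this classical closure $q_b$ depends only on the flow velocity and not on how much sediment is available, which is the kind of limitation the abstract refers to.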

    Visibility rendering order: Improving energy efficiency on mobile GPUs through frame coherence

    During real-time graphics rendering, objects are processed by the GPU in the order they are submitted by the CPU, and occluded surfaces are often processed even though they will end up not being part of the final image, thus wasting precious time and energy. To help discard occluded surfaces, most current GPUs include an Early-Depth test before the fragment processing stage. However, to be effective, it requires opaque objects to be processed in front-to-back order. Depth sorting and other object-level occlusion culling techniques incur overheads that are only offset in applications with substantial depth and/or fragment shading complexity, which is often not the case in mobile workloads. We propose a novel architectural technique for mobile GPUs, Visibility Rendering Order (VRO), which reorders objects front-to-back entirely in hardware by exploiting the fact that the objects in animated graphics applications tend to keep their relative depth order across consecutive frames (temporal coherence). Since order relationships are already tested by the Depth Test, VRO incurs minimal energy overhead: it only requires adding a small hardware unit to capture that information and use it later to guide the rendering of the following frame. Moreover, unlike other approaches, this unit works in parallel with the graphics pipeline without any performance overhead. We illustrate the benefits of VRO using various unmodified commercial 3D applications, for which VRO achieves a 27% speed-up and a 14.8% energy reduction on average over a state-of-the-art mobile GPU. Peer Reviewed. Postprint (author's final draft)
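
The effect of submission order on an early depth test can be illustrated with a small software model. This is only a sketch under strong assumptions, not the hardware mechanism of the paper: each object has a single depth value, the "visibility information" of the previous frame is reduced to a per-object depth table, and the names `DrawCall`, `count_shaded_fragments` and `reorder_front_to_back` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawCall:
    name: str
    depth: float       # toy model: one depth per object, smaller = closer
    pixels: frozenset  # ids of the pixels covered by the object


def count_shaded_fragments(draw_calls):
    """Render opaque draw calls in the given order with an early depth
    test and return how many fragments had to be shaded."""
    zbuffer = {}
    shaded = 0
    for dc in draw_calls:
        for p in dc.pixels:
            if dc.depth < zbuffer.get(p, float("inf")):
                zbuffer[p] = dc.depth  # fragment passes: shade and store
                shaded += 1
    return shaded


def reorder_front_to_back(draw_calls, prev_depth):
    """Sort by the depth each object had in the previous frame.
    Objects not seen before go last; sorted() is stable, so ties keep
    their submission order."""
    return sorted(draw_calls,
                  key=lambda dc: prev_depth.get(dc.name, float("inf")))


# Three full-screen objects on a 4-pixel screen, submitted back-to-front.
screen = frozenset(range(4))
frame = [
    DrawCall("far", 3.0, screen),
    DrawCall("mid", 2.0, screen),
    DrawCall("near", 1.0, screen),
]
# Assumed depths from the previous frame: slightly different values,
# same relative order (temporal coherence).
prev_depth = {"far": 3.1, "mid": 2.1, "near": 0.9}

submitted = count_shaded_fragments(frame)
reordered = count_shaded_fragments(reorder_front_to_back(frame, prev_depth))
print(submitted, reordered)
```

In this toy scene every fragment is shaded when the objects arrive back-to-front, while the reordered frame shades only the nearest object, which is the overshading reduction the abstract describes.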

    Rendering Elimination: Early Discard of Redundant Tiles in the Graphics Pipeline

    GPUs are one of the most energy-consuming components in real-time rendering applications, since a large number of fragment shading computations and memory accesses are involved. Main memory bandwidth is especially taxing for battery-operated devices such as smartphones. Tile-Based Rendering GPUs divide the screen space into multiple tiles that are independently rendered in on-chip buffers, thus reducing memory bandwidth and energy consumption. We have observed that, in many animated graphics workloads, a large number of screen tiles have the same color across adjacent frames. In this paper, we propose Rendering Elimination (RE), a novel micro-architectural technique that, by comparing signatures, accurately determines before rasterization whether a tile will be identical to the same tile in the preceding frame. Since RE identifies redundant tiles early in the graphics pipeline, it completely avoids the computation and memory accesses of the most power-consuming stages of the pipeline, which substantially reduces the execution time and the energy consumption of the GPU. For widely used Android applications, we show that RE achieves an average speedup of 1.74x and an energy reduction of 43% for the GPU/Memory system, far surpassing the benefits of Transaction Elimination, a state-of-the-art memory bandwidth reduction technique available in some commercial Tile-Based Rendering GPUs.
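
The signature comparison can be sketched in software as follows. This is a minimal illustration, not the design from the paper: it assumes that the inputs of a tile are available as byte strings, it uses CRC32 as the signature (a collision would wrongly skip a tile), and `tile_signature` and `process_frame` are hypothetical names.

```python
import zlib


def tile_signature(tile_inputs):
    """Fold all inputs of a tile (assumed to be byte strings, e.g. the
    primitives and state that overlap the tile) into one CRC32 value."""
    sig = 0
    for blob in tile_inputs:
        sig = zlib.crc32(blob, sig)
    return sig


def process_frame(tiles, prev_signatures, render_tile):
    """tiles maps tile_id -> list of byte strings.
    A tile is rendered only if its signature differs from the one it had
    in the previous frame. Returns (signatures, skipped_tile_ids)."""
    signatures = {}
    skipped = []
    for tile_id, inputs in tiles.items():
        sig = tile_signature(inputs)
        signatures[tile_id] = sig
        if prev_signatures.get(tile_id) == sig:
            skipped.append(tile_id)       # reuse the previous frame's tile
        else:
            render_tile(tile_id, inputs)  # inputs changed: render as usual
    return signatures, skipped


# Two consecutive frames: tile 0 is unchanged, tile 1 changes.
rendered = []
frame1 = {0: [b"triangle A"], 1: [b"triangle B"]}
frame2 = {0: [b"triangle A"], 1: [b"triangle B, moved"]}

sigs1, skipped1 = process_frame(frame1, {}, lambda t, i: rendered.append(t))
sigs2, skipped2 = process_frame(frame2, sigs1, lambda t, i: rendered.append(t))
print(skipped1, skipped2, rendered)
```

The check happens before any per-tile rendering work, which mirrors the point of the abstract: a redundant tile is discarded before the expensive stages run.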

    Dynamic sampling rate: harnessing frame coherence in graphics applications for energy-efficient GPUs

    In real-time rendering, a 3D scene is modelled with meshes of triangles that the GPU projects onto the screen. The triangles are discretized by sampling each one at regular spatial intervals to generate fragments, to which a shader program then adds texture and lighting effects. Realistic scenes require detailed geometric models, complex shaders, high-resolution displays and high screen refresh rates, all of which come at a great cost in compute time and energy. This cost is often dominated by the fragment shader, which runs for each sampled fragment. Conventional GPUs sample the triangles once per pixel; however, many screen regions contain so little variation that they produce identical fragments and could be sampled at a lower-than-pixel rate with no loss in quality. Additionally, as temporal frame coherence makes consecutive frames very similar, such variations are usually maintained from frame to frame. This work proposes Dynamic Sampling Rate (DSR), a novel hardware mechanism to reduce redundancy and improve energy efficiency in graphics applications. DSR analyzes the spatial frequencies of the scene once it has been rendered. Then, it leverages the temporal coherence of consecutive frames to decide, for each region of the screen, the lowest sampling rate to employ in the next frame that maintains image quality. We evaluate the performance of a state-of-the-art mobile GPU architecture extended with DSR for a wide variety of applications. Experimental results show that DSR is able to remove most of the redundancy inherent in the color computations at fragment granularity, which brings average speedups of 1.68x and energy savings of 40%. This work has been supported by the CoCoUnit ERC Advanced Grant of the EU's Horizon 2020 program (Grant No. 833057), the Spanish State Research Agency (MCIN/AEI) under Grant PID2020-113172RB-I00, the ICREA Academia program, and the Generalitat de Catalunya under Grant FI-DGR 2016.
Funding was provided by Ministerio de Economía, Industria y Competitividad, Gobierno de España (Grant No. TIN2016-75344-R). Peer Reviewed. Postprint (published version)
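
The per-region decision can be sketched as follows. This is a rough illustration under stated assumptions, not the DSR hardware: the spatial-frequency analysis is replaced by a much cruder stand-in (the max-minus-min range of luminance within a tile), only two rates are modelled, and `choose_sampling_rates`, the tile size and the threshold are hypothetical.

```python
def choose_sampling_rates(frame, tile=4, threshold=0.0):
    """frame is a 2-D list of luminance values from the rendered frame.
    For each tile, return "reduced" when the tile shows no more variation
    than `threshold`, and "full" (one sample per pixel) otherwise.
    The result is meant to be applied to the NEXT frame, relying on
    temporal coherence."""
    height, width = len(frame), len(frame[0])
    rates = {}
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            values = [
                frame[y][x]
                for y in range(ty, min(ty + tile, height))
                for x in range(tx, min(tx + tile, width))
            ]
            variation = max(values) - min(values)  # stand-in for frequency analysis
            key = (ty // tile, tx // tile)
            rates[key] = "reduced" if variation <= threshold else "full"
    return rates


# A 4x8 frame: the left tile is flat, the right tile is a horizontal gradient.
rendered_frame = [[0.5] * 4 + [0.1, 0.3, 0.6, 0.9] for _ in range(4)]
next_frame_rates = choose_sampling_rates(rendered_frame)
print(next_frame_rates)
```

A flat tile produces identical fragments, so shading it at a lower rate loses nothing, while the gradient tile keeps per-pixel sampling.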

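    Desarrollo de panuveítis por tuberculosis en paciente con esclerosis múltiple tratado con interferón beta [Development of tuberculous panuveitis in a patient with multiple sclerosis treated with interferon beta]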

    A 50-year-old former drug user presented to our hospital with bilateral blurred vision. The anamnesis revealed that he had undergone detoxification 20 years earlier, with complete abstinence since then and good social reintegration. His medical history showed positive serology for hepatitis B and C but negative serology for HIV. He had recently developed multiple sclerosis, which was well controlled with interferon beta. He also reported a recent, prolonged winter cold. The ophthalmologic examination revealed bilateral uveitis with keratic precipitates. Funduscopy of the right eye showed multiple choroidal granulomas and areas of peripheral vasculitis, both active and healed. Funduscopy of the left eye was normal. Diagnostic tests, including a chest X-ray and a pulmonary CT scan, revealed several pulmonary nodules. A Mantoux test was also performed and was strongly positive. Blood tests showed marked neutropenia and lymphopenia. Consequently, a presumptive diagnosis of panuveitis secondary to systemic tuberculosis was made. Antituberculosis drugs and systemic corticosteroids were administered, with a good systemic and ocular clinical response. Interferon beta-1b is a suitable immunomodulator for multiple sclerosis, but its main side effects are blood abnormalities such as leukopenia, lymphocytopenia and thrombocytopenia. CD4+ T lymphocytes, leukocytes, macrophages and granulocytes, together with the production of their mediators (interferon gamma, IL-12 or TNF-α), are essential to control Mycobacterium tuberculosis. Therefore, before starting interferon beta-1b, it would be advisable to perform screening tests, such as the Mantoux test or an interferon gamma release assay (QuantiFERON-TB), to detect latent tuberculosis that could potentially be reactivated.

    Improving the energy efficiency of the graphics pipeline by reducing overshading

    The most common task of GPUs is to render images in real time. When rendering a 3D scene, a key step is determining which parts of every object are visible in the final image. There are different approaches to solving the visibility problem, the Z-Test being the most common in modern GPUs. A main factor that significantly penalizes the energy efficiency of a GPU, especially in the mobile arena, is the so-called overshading, which happens when a portion of an object is shaded and rendered but finally occluded by another object. This useless work results in a waste of energy; however, the conventional Z-Test only eliminates a fraction of it. In this paper we present a novel microarchitectural technique, the Ω-Test, to drastically reduce overshading on a Tile-Based Rendering (TBR) architecture. The proposed approach leverages frame-to-frame coherence by taking advantage of the costly and valuable calculations made in previous frames. In particular, we propose to reuse information from the Z-Buffer of the previous frame, which is currently discarded. We make the observation that, due to the existing frame-to-frame coherence, the Z-Buffer of a frame will be highly similar in many areas to that of the previous frame. As a result, the proposed technique avoids many costly computations and off-chip memory accesses. Our experimental evaluation shows that the Ω-Test reduces the average energy consumption of the overall GPU/Memory system by 15.7% and the runtime of the evaluated benchmarks by 10.6% on average. This work has been supported by the CoCoUnit ERC Advanced Grant of the EU's Horizon 2020 program (grant No 833057), the Spanish State Research Agency under grant TIN2016-75344-R (AEI/FEDER, EU) and the ICREA Academia program. D. Corbalán-Navarro has been supported by a PhD research fellowship from the University of Murcia. Peer Reviewed. Postprint (author's final draft)

    Sistemas de geolocalización para el mando y control en el combate en el subsuelo en el horizonte 2035 [Geolocation systems for command and control in subsurface combat in the 2035 horizon]

    The aim of this work is to analyze possible solutions to the problem that geolocating infantry units in subsurface combat poses for command and control, this being one of the most common scenarios envisaged for the employment of land forces in the 2035 horizon. Meeting this aim requires achieving the following partial objectives:
    • Understand the current approach to underground combat in the 2035 horizon and determine the geolocation needs that the command and control of units has in this scenario.
    • Survey the geolocation systems for enclosed environments currently available on the market.
    • Analyze the advantages and limitations of the geolocation systems found for use in the command and control of infantry units in subsurface combat.